Comment parser#11252
Conversation
andreabedini
left a comment
There was a problem hiding this comment.
Thank you for taking on this work. I think adding a Comment constructor to Field is not a good design. It modifies the meaning of the type (and indeed this forces you to make functions like elementInLayoutContext return [Fields]).
I think this has been already discussed before: we have the ann parameter which can be used to keep hold on the comments. E.g.
data Comment ann = Comment !ann !ByteString
type FieldWithComments ann = Field ([Comment ann], ann)In this design each Field carries the comments preceding it, annotated with their position. An extra annotation marks the position of the field itself. Any comment at the end of the file would need to be captured separately.
This is already the practice of few packages developed by the community. Is there a reason to deviate from this?
|
Here's a preliminary benchmark done with Upstream: ~/r/haskell/cabal λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 203.400 s ± 21.816 s [User: 150.484 s, System: 39.487 s]
Range (min … max): 183.805 s … 277.905 s 30 runsThis branch: …/wt/haskell/cabal/exact-pp-leana λ lts-18.28
$ hyperfine --runs 30 './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 199.168 s ± 10.540 s [User: 156.373 s, System: 39.818 s]
Range (min … max): 184.443 s … 242.563 s 30 runsThank you for your response Andrea, I'll write up a response and get back to you soon :) |
|
Thank you for your comment @andreabedini :)
That looks very interesting, but how would I deal with files that are just comments? To the point of view of readFields they should be valid yet we would have no I do think your model is very interesting so if you have the time to, please show working PR against mine so we can simply merge it in 🙏
Could you elaborate which packages are these? I would love to have more insight on how people solve similar problems. Are there other design issues that needs to be addressed ? |
Are they valid Cabal files if there is nothing but comments? I don't think so.
For instance, annotateFieldsWithSource :: ByteString -> [Field Position] -> [Field ByteString]which annotates each field with its source including all adjacent comments. It can be modified to return |
| match _ = Nothing | ||
|
|
||
| -- | Collect comments into a map. The second field of the output will have no comment | ||
| extractComments :: Ord ann => [Field ann] -> (Map.Map ann ByteString, [Field ann]) |
There was a problem hiding this comment.
Could you possibly outline how the output of this function is supposed to be used? Now that we detached comments from fields, how do we reconstruct the original document? How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
There was a problem hiding this comment.
Could you possibly outline how the output of this function is supposed to be used?
This function recursively extracts the comments from the fields (the updated design variant does the same thing, albeit being more concise).
It is used to extract the comments out right before running the parseFieldGrammar parser in src/Distribution/PackageDescription/Parsec.hs. This allows us to not change the grammar and be able to directly inject the comments into the GenericPackageDescription.
Now that we detached comments from fields, how do we reconstruct the original document?
We can't yet. Following Jappie's proposal, we need to store the exact position of each field as a map (called exactPositions) indexed by the path in the rose tree (typed [NameSpace], a list of path segments). And then we will have the right information to construct the exact printer.
How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
To quote Jappie:
The issue for addition is that you now have to invent exact positions. for removal, if it involves a line, you've to fix up all following lines, (and it has to know something was removed).
I haven't thought about this thoroughly because it's out of scope of this PR, but it should be very much feasible.
Pretty sure you need at minimum |
|
Hi friends, thanks for all your responses. Leana needs some time to read up on the exact proposal to see how it all fits together before replying. |
|
Cabal is one of the toughest code bases I ever worked on, so I'm quite amazed by Leana making progress so quickly! |
|
Thank you Bodigrim and Jappie for your kind words! That means a lot to me, I'm glad to be on the right track. I have started (and completed) to rewrite my PR using Andrea's approach.
Good to know. Currently the top level parser drops the comments consumed if there are no fields to attach them to. Let me know what you think about the change :) |
|
Here are the benchmark results. The baseline has been rerun because I did these ones on a VPS machine, and they are not comparable to the last ones I ran on my machine. # baseline
leana@Ubuntu-2404-noble-amd64-base:~/cabal$ hyperfine './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 253.353 s ± 9.520 s [User: 196.642 s, System: 60.881 s]
Range (min … max): 241.649 s … 271.207 s 10 runs
# this PR
leana@Ubuntu-2404-noble-amd64-base:~/cabal$ hyperfine --setup "./validate.sh --partial-hackage-tests" "./validate.sh --partial-hackage-tests"
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 253.163 s ± 7.451 s [User: 196.432 s, System: 58.507 s]
Range (min … max): 239.373 s … 266.656 s 10 runs |
In my prototype I have replaced Where the extra annotation is for anything coming after the last field.
In addition to @Bodigrim's
I am available to discuss and support her effort. @leana8959 I'll reach out privately.
I warmly second this! |
|
@leana8959, @andreabedini: how is the private communication going? We are interested too! Could we help somehow? |
|
for clarity: These parser changes are ready for review as far as leana and me are concerned, meanwhile we've moved over to import stanza retention in GenericPackageDescription (they currently get merged into the stanza's). This is an independent change of the parser changes here. After that we can start on exact print propper. |
Please don't. A lot of code wants an elaborated (= stripped down of syntactic convenience) representation of package description, and GPD serves that now. Your parsing changes leak down the pipeline where they shouldn't. E.g. things like solver works with GPD, and it really shouldn't care about whether import stanzas were used to declare a package or not. |
|
This PR isn't about that, |
|
This PR does add , exactComments :: ExactComments Positionfield to GPD. I don't see a point of having comments in GPD. (As noted in tests, it "breaks" equality) |
|
@phadej Indeed, I have added a newtype around |
|
I understand, the authors would like to get reviews for this PR. If this is so, please, squash the commit history. For the size of PR: a good part of the changes are test-suite changes, it seems. I don't think they have to be extracted into separate PR or even separate commits (because having commits that don't pass CI individually may be cumbersome in the future, for git-bisecting and alike). |
|
@mpickering can you take a high-level look at the design in this PR and tell us your opinion? The gist of it is: and then many parsing utilities that used to return -parseGenericPackageDescription'
+parseAnnotatedGenericPackageDescription'
:: Maybe CabalSpecVersion
-> [LexWarning]
-> Maybe Int
- -> [Field Position]
- -> ParseResult src GenericPackageDescription
-parseGenericPackageDescription' scannedVer lexWarnings utf8WarnPos fs = do
+ -> [Field (WithComments Position)]
+ -> ParseResult src AnnotatedGenericPackageDescription
+parseAnnotatedGenericPackageDescription' scannedVer lexWarnings utf8WarnPos fs = do |
|
One particular thing that bothers me is that clients will have to call Any thoughts? Meta-comment: I looked through the tech proposal again, and there doesn't appear to be a technical description of a solution for this particular comments issue. It is totally fine, because the proposal would turn into a foliant at that level of detail. Neveretheless, I wish that, before doing all the technical work in this PR, the authors had discussed the actual technical solution for the particular issue (like preserving comments; others are listed in the tech proposal) on the Cabal bug tracker (here). |
@ulysses4ever Are there some rule of thumb to squash my commits? I already went through the history once this morning and split out all changes that touch the testsuite into one commit. |
26f34ae to
7751130
Compare
|
I cleaned up everything and made sure the test passes, this is ready for review! |
|
@jappeace @leana8959 should we remove the needs-review label for now given #11227 (comment) ? |
|
yes 😄 |
4d72dbd to
fe5ff3c
Compare
|
A checklist on what has been discussed in this thread and their conclusions based on the current design and implementation.
|
b98a51f to
0f472c0
Compare
(NB: I assume Bodigrim meant I'm sorry to say but this outline is not helpful to me. Can you come up with concrete examples of what separated comments may be useful for? Because to me (like Bodigrim, I assume) they seem useless. Same goes for It'd be good to have some indication how this work connects to the more general exact-print work (concretely, #11690 or #11814, whichever is alive; btw it'd be good to garbage collect at some point). |
@ulysses4ever thanks a lot for the review! As a different approach, we can also hide this from the interface for now. This PR will still have its value because it makes the parser ready for future exact print PRs even though this feature is not yet exposed via the interface. I plan to feed the data ParsecFieldGrammar s a = ParsecFG
{ fieldGrammarKnownFields :: !(Set FieldName)
, fieldGrammarKnownPrefixes :: !(Set FieldName)
- , fieldGrammarParser :: forall src. (CabalSpecVersion -> Fields Position -> ParseResult src a)
+ , fieldGrammarParser :: forall src. (CabalSpecVersion -> Fields (WithComments Position) -> ParseResult src a)
}
deriving (Functor)This will allow printing each field in field grammar to be self contained, and easier to reason with. We also avoid the difficulty of correlating comments with modified fields, which was one of the big blockers in the trivia tree approach #11425. |
|
I'm probably way late to the party, but I'm wondering whether it is morally correct to attach comments to fields in the first place. AFAIK, the cabal file format does not explicitly specify that comments are supposed to be associated with whatever non-comment follows; by convention, most "real" comments in practice are, but when, for example, you comment out fields or sections in a cabal file, this is almost certainly not the case (you don't think of a commmented-out field as being associated with whatever field happens to come after it). The fact that comments can occur at the end of a cabal file and need to be handled specially there is, IMO, a symptom of this, and my gut feeling says that this probably means that associating comments with fields like this is not the right design. |
Why not? Note that currently cabal doesn't retain any comments. It is very "correct" because it avoids answering the question, but it also doesn't let us print out the comments. Making comments a proper construction on par with Field and Sections was the previous design. However, as andrea bedini pointed out, it modifies the meaning of the type. Also, distinguishing what comment is a real comment is not in the scope of this PR. We don't do haddock-like generation where we need to know to which AST node the comment is attached to. This PR is only concerned with storing the comments properly and passing them on to be printed back out completely.
(Just to make sure we are on the same page, can you point to me which part of the diff you are talking about?) I think this situation occurs in every language that allows you to put a comment before and after a line. The real problem is that when the cabal file is only comments, we'd have no fields to attach them to. This is discussed here and we agreed that we'd only support valid cabal files, which will have a non null amount of fields/sections of fields. |
This is the first part of the exact print parser. In this PR I changed the lexer so instead of dropping the comments it emits them to the parser which is further stored in
GenericPackageDescription.Please let me know your thoughts!
Checklist below:
This PR modifies behaviour or interface
Include the following checklist in your PR:
significance: significantin the changelog file.